Calculate psi values on a sparse grid, to accelerate multiple calculations. #496
Conversation
By the way, you need Cython 3.0 to compile it, which is still an alpha version, so you can install it with
I'll have a look! Thanks for this. I am not too happy about the alpha dependency... What feature is it that you need? Can we use an older Cython and update when 3 comes out?
Cython 3.0 has been in alpha for like 2 years I think and it is now in its 11th iteration. I think it is pretty safe to use. Not using it is like using siesta 4.0 instead of 4.1b4. Anyway, this is not a dependency that you will impose on users, since you will provide the sources already compiled. So I don't think it's that dramatic. The problem is that fused types in the pure Python syntax don't work in 0.29, so I would have to rewrite everything with Cython syntax.
I must admit, if scipy isn't doing this, then I would also like to wait... I would like to follow scipy here... Also given that there are so many bugs listed in the 3.0 roadmap of Cython. While I partially agree that it should work, I think it is easier for many of the partial devs to not rely on alpha releases of Cython. Sorry.
Ok, it now works with 0.29.32. I didn't know you could mix both syntaxes. Since you can, this is an easy fix that doesn't annoy me too much and is easy to revert. I have also created an issue in Cython to see if they can solve it in 0.29: cython/cython#5085
Force-pushed from 56a7094 to d93d027
This pull request introduces 2 alerts when merging d93d027 into c5f7d5d - view on LGTM.com. New alerts:
I've done some changes. I realized that if I stored the sparse grid as a CSR matrix where rows are the elements of the grid (if it is a 3D grid, it has been raveled to 1D) and the columns are the extra dimension it would be beneficial because:
The structure is cleaner now, so generic routines can be implemented for useful things and they don't look ultra-complex. For example I implemented:
I also added lots of tests, so I now feel much more confident about its correctness and it will be safer to do modifications on it. There is one main problem though: I don't know how to make the functionality here play nicely with the current API for
Finally, here are some plots of using multiple density matrices in the same product computation vs using them one by one. The plots are for different grid shapes, for a system with 3000 orbitals:
Overall I think it's very nice because it has allowed me to calculate
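To make the layout described above concrete, here is a minimal illustrative sketch (plain numpy/scipy, not the PR's code) of a sparse grid stored as a CSR matrix with raveled grid points as rows and orbitals as columns:

```python
import numpy as np
from scipy import sparse

# Illustrative sketch (not the PR's code) of the storage layout described above:
# rows are the points of a 3D grid raveled to 1D, columns are the orbitals,
# and the stored values are psi_orbital(r_point).
grid_shape = (4, 5, 6)
n_orbitals = 10
n_points = int(np.prod(grid_shape))

rows = np.array([0, 0, 3, 7, 7, 7])   # raveled grid-point indices holding non-zero orbital values
cols = np.array([1, 4, 2, 0, 1, 9])   # orbital indices of those values
vals = np.random.rand(rows.size)      # the orbital values themselves

psi = sparse.csr_matrix((vals, (rows, cols)), shape=(n_points, n_orbitals))

# Reducing the orbital dimension with a coefficient vector (e.g. eigenstate
# coefficients) is then a sparse matrix-vector product: one value per grid point.
coeffs = np.random.rand(n_orbitals)
projected = psi @ coeffs              # shape (n_points,), reshape to grid_shape if needed
```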
Force-pushed from dc59856 to d46f65e
This pull request introduces 1 alert when merging d46f65e into c5f7d5d - view on LGTM.com. New alerts:
Force-pushed from d46f65e to a40b756
Force-pushed from a40b756 to 8bf2456
I'll have another look when you say so, thanks!!!!
Great! I'll try to amend things as we discussed. For now I was just making the branch functional again because I needed to use it.
Force-pushed from f01cf43 to 530324c
If this went in, I'm not sure it is more expensive to compute the LDOS directly by squaring wavefunctions. It would be something like this:
Where:
You end up with a matrix LDOS (grid points X energies). I'm not sure this is more expensive than the DM way, especially in Python since these are simple matrix multiplications that are already implemented in
Also, C can contain a third dimension for the k points, and doing the same operation you end up with
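For reference, a rough numpy sketch of the matrix shapes this amounts to (sizes are made up and this is not the PR's code; compare with the working example further down):

```python
import numpy as np

# Illustrative shapes only (made-up sizes, not the PR's code).
n_points, n_orbs, n_states, n_E = 1000, 40, 40, 200

psi = np.random.rand(n_points, n_orbs)   # orbital values on the grid points
C = np.random.rand(n_orbs, n_states)     # eigenstate coefficients
N = np.random.rand(n_states, n_E)        # occupation/broadening per energy point

VC = psi @ C                             # wavefunctions on the grid (points x states)
LDOS = (VC ** 2) @ N                     # LDOS (points x energies)
```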
I noticed that
I know this is not the most efficient way, it is faster to use Fourier transforms. But it was easy to add, and since there is no way currently of computing the overlap in sisl, I just added it here 😅
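For context, a numerical overlap from grid-sampled orbital values is just a quadrature over the grid; a minimal generic sketch (not the PR's implementation, and not the Fourier-transform route):

```python
import numpy as np

# Generic sketch: overlap matrix from orbital values sampled on a real-space grid.
# S_ij ~ sum_r psi_i(r) * psi_j(r) * dV, with dV the volume per grid point.
n_points, n_orbs = 5000, 8
dV = 1e-3                                # assumed volume element

psi = np.random.rand(n_points, n_orbs)   # orbital values on the grid
S = psi.T @ psi * dV                     # (n_orbs x n_orbs) overlap matrix
```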
I also leave this here. A working example of how to compute the LDOS using the proposal that I explained two comments above. Even though the occupations could have been a sparse matrix and they aren't, it was surprisingly fast. It allowed me to compute a high resolution full LDOS spectrum of a DZP water molecule in 1s.

```python
import sisl
import numpy as np
from pathlib import Path
import plotly.express as px

# Get fdf
fdf = sisl.get_sile("waterDZP.fdf")

# Read hamiltonian and get geometry
H = fdf.read_hamiltonian()
geom = H.geometry

# Compute eigenstates
eig = H.eigenstate()

# Compute psi values on a very fine grid
shape = (100, 300, 100)
psi_values = geom.orbital_values(shape, pbc=True)

# Compute wavefunctions on the grid
VC = psi_values.reduce_orbitals(eig.state.T)

# Square them
VC_2 = VC ** 2

# Get the occupations matrix (wavefunction X energy point)
gaussian = sisl.get_distribution("gaussian", smearing=0.2)
E = np.linspace(eig.eig[0] - 2, eig.eig[-1] + 2, 800)
N = np.array([gaussian(eig.eig - point_E) for point_E in E]).T

# Sum wavefunctions over X and Z, then multiply by occupations to get the LDOS
# on the Y direction
LDOS_y = VC_2.sum(axis=0).sum(axis=1) @ N

# Plot the LDOS
fig = px.imshow(
    LDOS_y.T,
    x=np.linspace(0, geom.cell[1, 1], LDOS_y.shape[0]),
    y=E,
    aspect="data",
).update_layout(
    title="LDOS of DZP water molecule",
    height=900,
    yaxis_title="Energy [eV]",
    xaxis_title="Y axis [Ang]",
)

# Plot some lines to see where the atoms are.
colors = ["red", "gray", "gray"]
for i, y in enumerate(geom.xyz[:, 1]):
    fig.add_vline(x=y, line_color=colors[i])

fig
```

:)
Force-pushed from 367ca9c to 781894d
Force-pushed from e5c33df to 990f9df
This is now ready. As we discussed a long time ago, the only user-visible change is the addition of a
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #496      +/-  ##
==========================================
- Coverage   87.49%   87.07%   -0.42%
==========================================
  Files         397      400       +3
  Lines       50972    51651     +679
==========================================
+ Hits        44597    44975     +378
- Misses       6375     6676     +301
```

☔ View full report in Codecov by Sentry.
I'll have a look at this in the coming weeks. Shouldn't we move this to pure-python mode? Or?
Yes, I can convert it
Code scanning / CodeQL annotations on the Cython sources:

```python
row_value: cython.numeric

for i in range(nrows):
    reduced_i = i // reduce_factor
```
Check warning: Variable defined multiple times. This assignment to 'reduced_i' is unnecessary as it is redefined.

```python
remaining = index
for iaxis in range(2, -1, -1):
    unraveled[iaxis] = remaining % grid_shape[iaxis]
    remaining = remaining // grid_shape[iaxis]
```
Check failure: Potentially uninitialized local variable.

```python
for iaxis in range(3):
    new_grid_shape[iaxis] = grid_shape[new_order[iaxis]]
    new_unraveled[iaxis] = unraveled[new_order[iaxis]]
```
Check failure: Potentially uninitialized local variable (several annotations on this loop).

```python
new_raveled = (
    new_unraveled[2]
    + new_unraveled[1] * new_grid_shape[2]
```
Check failure: Potentially uninitialized local variable (several annotations on this expression).
There is now the possibility to pre-compute orbital values to calculate the density. This results in much faster calculations, although it uses more memory.
Force-pushed from d1b6f76 to 220b2ec
Signed-off-by: Nick Papior <[email protected]>
I will merge this, but note that the methods can and probably will change at will in the future. There are some design choices that need some coalescing. Thanks!
Thank youuu!!! Yes, that's why everything is private except for the
You mean the
What do you mean? How would you do it if there are two methods that do the same thing? One of them demands more computation and the other demands more memory.
Ah, that aspect, yeah, that is the problem of this method; ideally it should default to remove the memory after use. Exactly how this should work is not clear at this moment. So, for now I guess this is ok.
What do you mean "after use"? Memory is cleared once you have finished computing all the products. This is unavoidable if you don't want to recompute the orbital values each time.
Sorry, I had pre-assumed the orbital values were stored on the
Ah, no,
Yes, but that would be much more complicated. I have tested some big systems and there are no problems with memory. I guess if you run out of memory you can just use the direct method. That's why I left the option to use one or the other.
Should I add the speed up to the changelog?
I am still cautious on promoting these methods (
Do you have an example where the density is wrong? In this PR the density matrix is translated into the unit cell to compute the density, so with that most issues should be solved.
Well, that might also cause errors because only the coordinates are translated, not the DM elements, and so there could be wrong-doings. This is my concern at least.
Nono, the matrix elements are translated as well. At the beginning of the density method there is a line like `dm = self.translate2uc()`
See, I have no idea what I am doing ;-D
I just recall I had some initial issues where I wasn't fully sure whether everything was correct. So my mind could play tricks.
I added this line on this PR, and I saw that it fixed some corner cases. That's why I say that maybe things are solved in the density now.
This PR addresses #493.
Concept and implementation details
The basic idea is to compute orbital values and store them in a sparse data structure. For now, the orbital values are computed in a new method, `Geometry.orbital_values`, and stored in a new class, `SparseGrid`. I didn't think too much on where to put the functionalities, so feel free to reorganize it however you think it fits best.

Once you have the `SparseGrid` data structure, you can call its `project_values` and `project_products` methods, which, through cython implementations, reduce the orbitals dimension and therefore project the values on a normal `sisl.Grid`. These routines accept coefficient arrays, which determine the weight of each value/product to be added according to the orbital indices from which it came. This generalization is then used to compute wavefunctions (using `project_values` with the coefficients being the eigenstate coefficients), and electronic density (using `project_products` with the coefficients being the density matrix).

Some considerations about these two cython routines:

- `project_products`: It loops over grid points to facilitate finding the overlapping orbitals. This makes it trivially easy to parallelize I would say, because there are no race conditions or any other problem. In principle, it should be as simple as changing the `range` call by cython's `prange` and compiling with openmp support, but I haven't tried.
- `project_products` uses a dense coefficients matrix, because accessing a sparse one using indices is painfully slow if you put it inside a loop. Perhaps a smarter implementation could be added in the future to manage sparse matrices in case memory is a problem. But for now I don't see it as a problem, even for huge DFT systems.
- `project_values`: In this case, there was no need to loop over some specific quantity, so it just loops over values and adds them as they come. This avoids the need for sorting the `SparseGrid`, but probably makes parallelization more difficult, since multiple iterations modify the same item of the grid. This is easy to solve by looping over grid points as in `project_products`, as long as the benefits are bigger than the cost of sorting.
- If you use `SparseGrid` for something else (a path or a simple collection of points, then it would be called `SparseRealSpace` I guess), you can use these same routines out of the box. This opens the door for adding other `RealSpace` classes in `sisl`, instead of having to always go through grids, which sometimes are unnecessarily expensive for the goals of the user.

Performance for computing the density
First, I needed to check that the implementation gives correct results. I took some SIESTA calculations that I had and compared the RHO file returned by SIESTA with the density calculated by this implementation using the DM file, on the exact same grid. Then I took the absolute difference and computed histograms to check that the differences are close to 0. For both gamma point calculations and k-point calculations I get histograms like this:

Which indicates that it is basically the same electronic density (differences < `7e-7`).

Here's the code used to plot differences
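A rough sketch of this kind of comparison (placeholder file names, and not necessarily the exact code used here):

```python
import numpy as np
import matplotlib.pyplot as plt
import sisl

# Placeholder file names; this only sketches the comparison described above.
fdf = sisl.get_sile("RUN.fdf")
DM = fdf.read_density_matrix()

# Reference density written by SIESTA.
rho_siesta = sisl.get_sile("RUN.RHO").read_grid()

# Density computed from the DM on the exact same grid.
grid = sisl.Grid(rho_siesta.shape, geometry=DM.geometry)
DM.density(grid)

# Histogram of absolute differences; it should be sharply peaked close to 0.
diff = np.abs(grid.grid - rho_siesta.grid)
plt.hist(diff.ravel(), bins=100)
plt.xlabel("|rho_sisl - rho_SIESTA|")
plt.show()
```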
Now that I know that it works, I went on to check the performance. For this, I split the contributions of computing the orbital values and the density computation, given that one of the goals is to use the same orbital values to call `DM.density` multiple times. This is the scalability I get with grid points:

So, both computations seem to scale close to linearly and are quite cheap, especially if we compare it with the current implementation, which I include here for reference using a logarithmic time scale:

For 1000 grid points in this system the current implementation takes 46s while the new one takes around 500ms, so an improvement of almost two orders of magnitude.
What is the source of this improvement? I found two main points:

- The call `grid.index(ia_atom.toSphere(ia_xyz))` used by the current implementation seems to be very inefficient. By changing it to the approach in the new implementation (getting a bounding box and then filtering by radius) I was able to improve the performance of the current implementation from 46s to 12s. The new method is probably a bit worse for skewed grids, where the bounding box is overestimated, but still I expect that it should be better.

Here's the code used to plot the scalability
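As an illustration of the bounding-box-then-radius-filter idea, a short sketch assuming an orthogonal grid, with hypothetical names rather than sisl's internals:

```python
import numpy as np

# Find the grid points inside an atom's sphere: first a cheap bounding box in
# grid indices, then an exact distance filter (orthogonal grid assumed).
grid_shape = np.array([100, 100, 100])
cell_lengths = np.array([10.0, 10.0, 10.0])   # grid spans these lengths along x, y, z
spacing = cell_lengths / grid_shape

center = np.array([5.0, 5.0, 5.0])            # atom position
R = 2.3                                       # orbital cutoff radius

# 1) Bounding box in grid indices (possibly overestimated for skewed grids).
lo = np.maximum(np.floor((center - R) / spacing).astype(int), 0)
hi = np.minimum(np.ceil((center + R) / spacing).astype(int), grid_shape - 1)

# 2) Filter the candidate points by actual distance to the atom.
ix, iy, iz = [np.arange(l, h + 1) for l, h in zip(lo, hi)]
I, J, K = np.meshgrid(ix, iy, iz, indexing="ij")
coords = np.stack([I, J, K], axis=-1) * spacing
inside = np.linalg.norm(coords - center, axis=-1) <= R
points = np.stack([I[inside], J[inside], K[inside]], axis=-1)   # indices inside the sphere
```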
Performance for computing wavefunctions
For wavefunctions, I also checked with the differences histogram that I get the same results as with the current implementation for gamma point and k-point calculations.
The performance of the current implementation and the implementation through stored psi values is very similar (the new implementation is slightly more expensive, about 10%). This is to be expected, since the expensive part is computing the orbital values, which is done in a similar way in both cases. However the new implementation uses much more memory, since it stores the values instead of adding them on the fly. This is why I kept the current implementation as the default. Computing the wavefunction through stored psi values should only be done when you are computing multiple wavefunctions, or you have already computed the psi values for something else. If that is the case, calculating wavefunctions is of course very cheap in terms of time.
Here's the code used to compare wavefunctions with both methods
Looking forward to your thoughts on this!